Abstract: The problem of reconstructing a source sequence
in the presence of decoder side-information that is
mis-synchronized to the source due to deletions is studied in a
distributed source coding framework. Motivated by practical 
applications, the deletion process is assumed to be bursty and 
is modeled by a Markov chain. The minimum rate needed 
to reconstruct the source sequence with high probability is 
characterized in terms of an information theoretic expression, 
which is interpreted as the amount of information of the deleted 
content and the locations of deletions, subtracting "nature's 
secret", that is, the uncertainty of the locations given the source 
and side-information. For small bursty deletion probability, the 
asymptotic expansion of the minimum rate is computed. 

I. Introduction 

In distributed file backup or file sharing systems, different 
source nodes may have different versions of the same file 
differing by a small number of edits including deletions and 
insertions. The edits usually appear in bursts, for example, a 
paragraph of text is deleted, or several consecutive frames of 
video are inserted. An important question is: how to efficiently 
send a file to a remote node that has a different version of it? 
Further, what is the fundamental limit of the number of bits 
that needs to be sent to achieve this goal? 
Fig. 1. Synchronizing source sequences based on deletion side-information.

¹ This material is based upon work supported by the US National Science
Foundation (NSF) under grants 23287 and 30149 and by a gift from
Qualcomm Inc. Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the authors and do not necessarily reflect
the views of the NSF.

In this paper, we study the problem of reconstructing a
source sequence with the help of decoder side-information
using a distributed source coding framework (see Figure 1
for an illustration of the system). We focus
on a simple case where the side-information is a deleted
version of the source sequence. Consider a binary sequence
of length $n$ denoted by $X^n = (X_1, \ldots, X_n)$. Consider another
binary sequence of length $n$, called the deletion pattern and denoted by
$D^n = (D_1, \ldots, D_n)$, which determines how $X^n$ is to be deleted.
The outcome of the deletion process, denoted by $y(X^n, D^n)$, is
derived from $X^n$ by deleting the bits at those locations where
the deletion pattern is 1. Here is an example:

$X^n = (0,1,0,1,1,0,1,0,1,0)$
$D^n = (0,1,1,0,0,0,1,1,1,0)$
$y(X^n, D^n) = (0,1,1,0,0)$.
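The deletion map in the example above can be sketched in a few lines of Python. The function name `apply_deletion` is a hypothetical helper introduced here for illustration; it is not part of the paper.

```python
def apply_deletion(x, d):
    """Return the subsequence of x that survives the deletion pattern d.

    d[i] == 1 means x[i] is deleted; d[i] == 0 means x[i] is kept.
    """
    if len(x) != len(d):
        raise ValueError("x and d must have the same length")
    return tuple(bit for bit, deleted in zip(x, d) if not deleted)


# The example from the text.
x = (0, 1, 0, 1, 1, 0, 1, 0, 1, 0)
d = (0, 1, 1, 0, 0, 0, 1, 1, 1, 0)
y = apply_deletion(x, d)
print(y)  # (0, 1, 1, 0, 0)
```

Only the surviving bits are returned; the positions of the deletions are not recorded, which is exactly what makes the decoder's side-information mis-synchronized.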

Note that the deletion pattern $D^n$ tends to have bursts of
consecutive 1's, which lead to bursty deletions. The original
files $X^n$ and the deleted files $y(X^n, D^n)$ are available to the
encoder and the decoder, respectively. The encoder sends a
message to the decoder, so that the latter can reconstruct
(synchronize) the original files $X^n$ with an error probability
that vanishes as $n$ goes to infinity. The objective of
this work is to characterize the minimum rate of the message,
defined as the minimum number of bits per source bit.

The problem of synchronizing edited sequences has been
studied in [1], [2] under the assumptions that (1) the decoder is
not allowed to make any error, and (2) the number of edits is a
constant that does not increase with the length of the sequence.
Upper and lower bounds on the minimum number of communication
bits were provided as functions of the number of edits
and the length of the sequence. In [3], an interactive,
low-complexity and asymptotically optimal scheme was proposed.
In comparison, in this paper we consider an information
theoretic formulation allowing a positive probability of error
that vanishes as $n$ increases. This assumption allows us to
use additional techniques like random binning to reduce the
minimum rate. Unlike in assumption (2), we consider the case
that a vanishing fraction of source bits, rather than a constant
number of bits, is deleted, which makes the problem
harder and more realistic.

In this paper, we characterize the minimum rate in terms of
the limit of the conditional entropy of the source sequence
given the side-information. We interpret the minimum rate
as the amount of information in the deleted content and the
locations of the deletions, subtracting the uncertainty of the
locations given the source and side-information. We refer to
the latter as "nature's secret". This is the information that
the decoder will never find out even if it knows the source
sequence and the side-information exactly; it represents the
over-counting of information in the locations of the deletions.
For example, if $X^n = (0, 0)$ and $y(X^n, D^n) = (0)$, the decoder
will never know and never needs to know whether the first
bit or the second bit is deleted. Therefore the information
about the precise location of the deleted bit is over-counted and
should be subtracted. For small deletion rate and geometrically
distributed burst length, the minimum rate is computed up to
the precision of two leading terms.

If the deletion pattern $D^n$ is independent and identically
distributed (iid), $X^n$ and $y(X^n, D^n)$ are the input and output of
a binary iid deletion channel (see [4] and references therein).
In this case, the problem of characterizing the minimum rate
to reconstruct iid uniform source sequences in the distributed
source coding problem is closely related to the evaluation of
the mutual information across the deletion channel with iid
uniform input distribution. For small deletion probability, the
second and third order terms² of the channel capacity are
achieved by iid uniform input distribution and are computed
in [5, Lemma III.1]. In this paper we consider the asymptotic
expansion of the minimum rate for the general bursty
deletion process where the deletions are correlated over time.
In the special case of an iid deletion process, the expansion in
Theorem 1 reduces to [5, Lemma III.1]. Note that in the
source coding problem, the constant term becomes zero, which
means that the second and third order terms of the channel
capacity correspond to the first and second order terms of
the minimum rate. Therefore, although it is mathematically
equivalent to evaluate these terms for the source coding
and channel coding problems, from the practical point of
view, the evaluation is more important for the source coding
problem than for the channel coding problem. See Remark 3
for detailed discussions.

When we generalize the iid deletion process to a bursty
deletion process, new techniques are needed. The most
interesting technique is the generalization of the usual concept
of a "run". We view the sequence (1, 0, 1, 0, 1, 0) as a run with
respect to deletion bursts of length two, because deleting two
consecutive bits from that sequence always results in the same
outcome sequence (1, 0, 1, 0).

The rest of this paper is organized as follows. In Section II
we formally set up the problem and provide a preview of the
main result. In Section III we provide information theoretic
expressions of the minimum rate for general parameters of the
deletion pattern. In Section IV we focus on the asymptotics
when the deletion rate is small and compute the two leading
terms of the minimum rate. All the proofs are provided in the
appendices.

Notation: With the exception of the symbols $R$, $E$, $C$, and $J$,
random quantities are denoted in upper case and their specific
instantiations in lower case. For integers $i \le j$, $V_i^j$ denotes the
sequence $(V_i, \ldots, V_j)$ and $V^j$ denotes $V_1^j$. The binary entropy
function is denoted by $h_2(\cdot)$. All logarithms are base 2. The
notation $\{0,1\}^n$ denotes the $n$-fold Cartesian product of $\{0,1\}$,
and $\{0,1\}^*$ denotes $(\bigcup_{k \in \mathbb{Z}^+} \{0,1\}^k) \cup \{\emptyset\}$.

² For small deletion probability $d$, the first order term of the channel capacity
is 1, the second order term is $\Theta(d \log d)$, and the third order term is $\Theta(d)$.



II. Problem Formulation and Main Result 

A. Problem formulation 

The source sequence $X^n = (X_1, \ldots, X_n) \in \{0,1\}^n$ is
iid Bernoulli(1/2). Let $\alpha, \beta \in (0,1)$. The deletion pattern
$(D_0, D_1, \ldots, D_{n+1})$³ is a two-state stationary Markov chain
illustrated in Figure 2 with the initial distribution $p_{D_0} \sim$
Bernoulli($d$), where $d := \beta/(\alpha+\beta)$, and transition probabilities
$P(D_i = 0 \mid D_{i-1} = 1) = 1 - P(D_i = 1 \mid D_{i-1} = 1) = \alpha$ and
$P(D_i = 1 \mid D_{i-1} = 0) = 1 - P(D_i = 0 \mid D_{i-1} = 0) = \beta$ for all
$i = 1, 2, \ldots, n+1$. Note that the initial distribution $p_{D_0}$ is
the stationary distribution of the Markov chain. The deleted
sequence $y(X^n, D^n) \in \{0,1\}^*$ is a subsequence of $X^n$, which is
derived from $X^n$ by deleting all those $X_i$'s with $D_i = 1$. The
length of $y(X^n, D^n)$, denoted by $L_y$, is a random variable taking
values in $\{0, 1, \ldots, n\}$. For $i \le L_y$, $Y_i$ denotes the $i$-th bit in the
$y(X^n, D^n)$ sequence. A run of consecutive 1's in the deletion
pattern is called a burst of deletion. Since $\beta$ is the probability
to initiate a burst of deletion, it is called the deletion rate.
Fig. 2. Markov model for the deletion pattern process $\{D_i\}_{i \ge 0}$. $D_i = 1$
means $X_i$ is deleted; $D_i = 0$ means $X_i$ is not deleted.
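The two-state Markov model can be simulated directly. The following is a minimal sketch, assuming the transition probabilities stated in the problem formulation; the function names and the chosen parameter values are illustrative assumptions, not taken from the paper.

```python
import random


def stationary_deletion_fraction(alpha, beta):
    """Stationary probability d = beta / (alpha + beta) of the state D_i = 1."""
    return beta / (alpha + beta)


def sample_deletion_pattern(n, alpha, beta, rng):
    """Sample (D_0, D_1, ..., D_{n+1}) from the stationary two-state chain.

    alpha = P(D_i = 0 | D_{i-1} = 1): probability that a burst ends.
    beta  = P(D_i = 1 | D_{i-1} = 0): probability that a burst starts.
    """
    d = stationary_deletion_fraction(alpha, beta)
    state = 1 if rng.random() < d else 0  # D_0 drawn from the stationary law
    pattern = [state]
    for _ in range(n + 1):
        if state == 1:
            state = 0 if rng.random() < alpha else 1
        else:
            state = 1 if rng.random() < beta else 0
        pattern.append(state)
    return pattern


rng = random.Random(0)  # fixed seed so that the run is reproducible
pattern = sample_deletion_pattern(200_000, alpha=0.5, beta=0.05, rng=rng)
empirical = sum(pattern) / len(pattern)
print(empirical, stationary_deletion_fraction(0.5, 0.05))
```

For a long sample the empirical fraction of 1's should be close to $d$, and the 1's appear in runs whose lengths are geometric with parameter $\alpha$.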

The source sequence $X^n$ is available to the encoder and the
deleted sequence $y(X^n, D^n)$ is available only to the decoder
as side-information. The deletion pattern $D^n$ is available to
neither the encoder nor the decoder. The encoder encodes $X^n$
and sends a message to the decoder so that the decoder can
reproduce the source with high probability.

Remark 1: If $\beta = 1 - \alpha = d$, $D^n$ becomes iid, and the relation
between $X^n$ and $y(X^n, D^n)$ can be modeled as an iid deletion
channel with deletion probability $d$. In this paper we consider
the Markov deletion pattern to emphasize the bursty nature of
the deletion process in the source coding problem.

The formal definitions of a code and an achievable rate are 
as follows. 

Definition 1: A distributed source code for deletion side-information
with parameters $(n, |\mathcal{M}_n|)$ is the tuple $(f_n, g_n)$
consisting of an encoding function $f_n : \{0,1\}^n \to \mathcal{M}_n$ and
a decoding function $g_n : \mathcal{M}_n \times \{0,1\}^* \to \{0,1\}^n$.

Definition 2: A real number $R$ is called an achievable
rate if there exists a sequence of distributed source codes
$\{(f_n, g_n)\}_{n \ge 1}$ for deletion side-information with parameters
$(n, |\mathcal{M}_n|)$ satisfying $\lim_{n\to\infty} P(X^n \ne g_n(f_n(X^n), y(X^n, D^n))) = 0$
and $\limsup_{n\to\infty} (1/n) \log |\mathcal{M}_n| \le R$.

The set of all achievable rates is necessarily closed and
hence the minimum exists. The minimum achievable rate is
denoted by $R_{\min}$. The focus of this paper is to characterize
$R_{\min}$, especially for small $\beta$.

³ $D_0$ and $D_{n+1}$ do not determine the deletion of any source bit and do
not play a role in the problem formulation. However, they are used in the
information theoretic expressions in Sections III and IV.

B. Main result 

In Section III we express $R_{\min}$ using information theoretic
quantities when the parameters $\alpha$ and $\beta$ take arbitrary values.
Unfortunately, we cannot provide an explicit expression of
$R_{\min}$ as a function of $\alpha$ and $\beta$. Hence we focus on asymptotic
regimes in Section IV when $\beta$ is small.

Since the main difference between the erasure process and
the deletion process is that the locations of the erasures are
explicit but those of the deletions are not, it is interesting to
focus on a regime where the amount of information to describe
the locations of the deletions plays a significant role in
the minimum rate. When $\alpha$ is vanishing and the length of
bursts of deletions is increasing, for each burst, the number
of bits to describe the deleted content increases linearly with
respect to the length of the burst, but the number of bits
to describe the location and length of the burst increases
logarithmically. Therefore the regime with a vanishing $\alpha$ is not
interesting. On the contrary, when $\alpha$ is fixed, the length of a
burst is of order $O(1)$ and we have an interesting regime. In
this case, we evaluate $R_{\min}(\alpha, \beta)$ as follows.

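Theorem 1: When $\alpha$ is fixed, for any $\epsilon > 0$, we have

$$R_{\min}(\alpha, \beta) = -\beta \log \beta + \beta \left( \frac{1 + h_2(\alpha)}{\alpha} + \log e - C \right) + O(\beta^{2-\epsilon}), \quad (2.1)$$

where $C = \sum_{l=1}^{\infty} 2^{-l-1} l \log l \approx 1.29$.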

The proof of Theorem 1 is based on Lemmas 1 and 2 and is
provided in Appendix C. Detailed discussions about the proof
techniques are given in Section IV-B.

Remark 2: The dominating term on the right side of (2.1)
is $-\beta \log \beta$, and the second leading term is of order $\Theta(\beta)$.
Since $-\log \beta$ tends to infinity slowly as $\beta$ decreases to zero,
in practice these two terms are often of the same order of
magnitude. Therefore we need to evaluate both of them.
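To make Remark 2 concrete, the two leading terms of (2.1) can be evaluated numerically. This is a minimal sketch; the helper names are hypothetical, the constant is taken as the rounded value $C \approx 1.29$ quoted in Theorem 1, and the parameter values are arbitrary illustrations.

```python
import math

C_APPROX = 1.29  # rounded value of the constant C quoted in Theorem 1


def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def leading_terms(alpha, beta):
    """Return the first and second leading terms of (2.1)."""
    first = -beta * math.log2(beta)
    second = beta * ((1.0 + h2(alpha)) / alpha + math.log2(math.e) - C_APPROX)
    return first, second


for beta in (1e-2, 1e-3, 1e-4, 1e-6):
    first, second = leading_terms(alpha=0.5, beta=beta)
    print(f"beta={beta:g}  first={first:.3e}  second={second:.3e}  ratio={first / second:.2f}")
```

Because $-\log \beta$ grows only logarithmically, the ratio of the two terms stays within a small factor over a wide range of $\beta$, which is why neither term can be dropped in practice.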

Remark 3: In [5], the authors evaluated the mutual information
across the iid deletion channel with iid Bernoulli(1/2)
input as

$$\lim_{n\to\infty} \frac{1}{n} I(X^n; y(X^n, D^n)) = 1 + d \log d - d(\log 2e - C) + O(d^{2-\epsilon}),$$

which implies that

$$\lim_{n\to\infty} \frac{1}{n} H(X^n \mid y(X^n, D^n)) = -d \log d + d(\log 2e - C) + O(d^{2-\epsilon}).$$

This expression should be compared with (2.1) in the special
case that the deletion process is iid, which requires $\beta = 1 - \alpha = d$.
Under this condition, (2.1) also has the same two leading
terms $-d \log d + d(\log 2e - C)$. Therefore in the special case of
an iid deletion process, (2.1) is consistent with the result in [5].

Remark 4: Theorem 1 implies that when the input distribution
is iid Bernoulli(1/2), the mutual information across the
bursty deletion channel is

$$\lim_{n\to\infty} \frac{1}{n} I(X^n; y(X^n, D^n)) = 1 + \beta \log \beta - \beta \left( \frac{1 + h_2(\alpha)}{\alpha} + \log e - C \right) + O(\beta^{2-\epsilon}). \quad (2.2)$$

In [6], Dobrushin showed that the channel capacity of the
iid deletion channel is $\lim_{n\to\infty} (1/n) \max_{p_{X^n}} I(X^n; y(X^n, D^n))$. If
this expression can be extended to the bursty deletion channel
where the deletion pattern process is a Markov chain, then
(2.2) provides an asymptotic lower bound for the capacity of
the bursty deletion channel for small values of $\beta$.

III. Information Theoretic Expression for General $\alpha$ and $\beta$

We can write the minimum achievable rate $R_{\min}$ as the
following information theoretic expression.
Lemma 1:

$$R_{\min} = \lim_{n\to\infty} \frac{1}{n} H(X^n \mid y(X^n, D^n), D_0, D_{n+1}).$$

The proof of Lemma 1 is given in Appendix A.
The structure of the proof is as follows: (1) we
show that the limit $\lim_{n\to\infty} (1/n) H(X^n \mid y(X^n, D^n), D_0, D_{n+1})$
exists, (2) using the information-spectrum method [7,
Section 7.2], we have $R_{\min} = \overline{H}(X^n \mid y(X^n, D^n)) :=
\text{p-}\limsup_{n\to\infty} (1/n) \log(1/p_{X^n \mid y(X^n, D^n)}(X^n \mid y(X^n, D^n)))$, which is
the conditional spectral sup-entropy, (3) we show that
$\overline{H}(X^n \mid y(X^n, D^n)) = \lim_{n\to\infty} (1/n) H(X^n \mid y(X^n, D^n), D_0, D_{n+1})$.
The techniques we use in step (3) are similar to those Dobrushin
used in [6], where the capacity of the iid deletion channel
is characterized by $\lim_{n\to\infty} (1/n) \max_{p_{X^n}} I(X^n; y(X^n, D^n))$.
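For very short sequences, the conditional entropy in Lemma 1 can be computed by exhaustive enumeration. The sketch below is an illustration only: it computes $(1/n) H(X^n \mid y(X^n, D^n))$ without the extra conditioning on $D_0$ and $D_{n+1}$ (an assumption made to keep the code short; since $H(D_0, D_{n+1}) \le 2$ bits, the two normalized quantities differ by at most $2/n$). All function names are hypothetical, and the cost grows as $4^n$, so only small $n$ is feasible.

```python
import itertools
import math


def apply_deletion(x, d):
    """Keep the bits of x at positions where the deletion pattern d is 0."""
    return tuple(bit for bit, deleted in zip(x, d) if not deleted)


def pattern_probability(d, alpha, beta):
    """Probability of (D_1, ..., D_n) under the stationary two-state chain."""
    stationary = beta / (alpha + beta)
    prob = stationary if d[0] == 1 else 1.0 - stationary
    for prev, cur in zip(d, d[1:]):
        if prev == 1:
            prob *= (1.0 - alpha) if cur == 1 else alpha
        else:
            prob *= beta if cur == 1 else (1.0 - beta)
    return prob


def conditional_entropy_rate(n, alpha, beta):
    """(1/n) H(X^n | y(X^n, D^n)) in bits, by brute-force enumeration."""
    joint = {}     # (x, y) -> probability
    marginal = {}  # y -> probability
    for x in itertools.product((0, 1), repeat=n):
        for d in itertools.product((0, 1), repeat=n):
            p = 2.0 ** (-n) * pattern_probability(d, alpha, beta)
            if p == 0.0:
                continue  # skip impossible patterns to avoid log(0)
            y = apply_deletion(x, d)
            joint[(x, y)] = joint.get((x, y), 0.0) + p
            marginal[y] = marginal.get(y, 0.0) + p
    h_joint = -sum(p * math.log2(p) for p in joint.values())
    h_marginal = -sum(p * math.log2(p) for p in marginal.values())
    return (h_joint - h_marginal) / n


print(conditional_entropy_rate(6, alpha=0.5, beta=0.1))
```

The value for small $n$ is only a rough indication of the limit; the point of the sketch is to show which joint distribution the entropy in Lemma 1 is taken over.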

In Lemma 2 the information theoretic expression of the
minimum rate is written in another way, which has a more
intuitive interpretation as explained in Remark 5.

Lemma 2:

$$R_{\min} = d + H(D_1 \mid D_0) - E_\infty, \quad (3.3)$$

where $E_\infty := \lim_{n\to\infty} E_n$, and $E_n := H(D_1 \mid D_0, X^n, y(X^n, D^n), D_{n+1})$.

The proof of Lemma 2 is given in Appendix B.
Remark 5: Lemma 2 expresses $R_{\min}$ in terms of three parts,
which can be intuitively interpreted as follows. The first term
$d$ is the fraction of deleted bits in $X^n$. It represents the amount
of information per source bit in the deleted content, and thus
the rate needed to send the deleted content. The second term
is the entropy rate of the deletion pattern process, which is
the rate needed to describe the locations of deletions. If the
encoder knew the locations and sent them together with the
deleted content, the decoder could reproduce $X^n$. However,
this is excessive information. In fact, even if the decoder can
correctly reproduce $X^n$, it can never know the exact deletion
pattern. Therefore the uncertainty of the deletion pattern $D^n$,
given $X^n$ and $y(X^n, D^n)$, is not required to be revealed in order
to reproduce $X^n$.

The uncertainty in the deletion pattern, given the source
sequence and side-information, is nature's secret, which is
known only to an imaginary third party (nature) who generates
the deletion pattern. Since nature's secret is not required to
reproduce $X^n$, it should be subtracted from the message rate.
Lemma 2 shows that nature's secret per source bit, which is
the uncertainty in the whole deletion pattern $D^n$ normalized
by $n$, can be expressed as $E_\infty$, which is the uncertainty in only
$D_1$. An intuitive explanation is that the uncertainty in each bit
in $D^n$ is approximately the same, therefore the uncertainty can
be represented by the uncertainty in only $D_1$.

IV. Asymptotic Behavior of $R_{\min}$ for Small Values of $\beta$

In typical settings the number of edits is often much less
than the file size. Since $\beta$ is the probability to start a burst of
deletions, the asymptotic behavior of $R_{\min}$ for small $\beta$ is of
special interest.

A. Case 1: A small number of long bursts of deletion: $\alpha \ll 1$, $\beta \ll 1$, and $\alpha/\beta$ is fixed

When $\alpha \ll 1$, $\beta \ll 1$ and $\alpha/\beta$ is fixed, the number of bursts
is much smaller than the length of the sequence, and each
burst is so long that the overall fraction of deletion $d = \beta/(\alpha + \beta)$
is a constant.

On the right side of (3.3), the first term is a constant. For
any $\epsilon > 0$, the second term $H(D_1 \mid D_0) = d h_2(\alpha) + (1 - d) h_2(\beta) =
O(\beta^{1-\epsilon})$, and the third term $E_\infty \le H(D_1 \mid D_0) = O(\beta^{1-\epsilon})$.
According to Lemma 2 we have

$$R_{\min}(\alpha, \beta) = d + O(\beta^{1-\epsilon}).$$

Intuitively speaking, if we have a small number of long bursts
of deletion, the amount of information in the locations of
deletions is orderwise less than the amount of information in
the content of deletion. Therefore $R_{\min}$ is dominated by the
rate needed to deliver the deleted content.

A more interesting case is when all three terms of (3.3) are
comparable.

B. Case 2: A small number of short bursts of deletion: $\alpha$ is fixed and $\beta \ll 1$

When $\alpha$ is fixed and $\beta \ll 1$, the number of bursts is much
smaller than the length of the sequence. Since the length of a
burst is drawn from a geometric distribution with parameter $\alpha$,
the expected length is of order $O(1)$. The overall proportion of
deleted bits is $d = \beta/(\alpha + \beta) = \beta/\alpha + \Theta(\beta^2)$. In this case, unlike
in Case 1, the location information and "nature's secret" are
comparable to the content information. Therefore we need to
evaluate all three terms for this case. The three terms on the
right side of (3.3) are evaluated as follows. For any $\epsilon > 0$, we
have

$$d = \beta/\alpha + \Theta(\beta^2), \quad (4.4)$$

$$H(D_1 \mid D_0) = -\beta \log \beta + \frac{\beta h_2(\alpha)}{\alpha} + \beta \log e + O(\beta^{2-\epsilon}), \quad (4.5)$$

$$-E_\infty = -C\beta + O(\beta^{2-\epsilon}), \quad (4.6)$$

where $C = \sum_{l=1}^{\infty} 2^{-l-1} l \log l \approx 1.29$. Combining (4.4) through
(4.6) gives Theorem 1.
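The expansions (4.4) and (4.5) can be checked numerically against the exact closed forms $d = \beta/(\alpha+\beta)$ and $H(D_1 \mid D_0) = d\,h_2(\alpha) + (1-d)\,h_2(\beta)$. The following is a minimal sketch with hypothetical function names; the parameter values are arbitrary.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def exact_content_plus_location(alpha, beta):
    """Exact value of d + H(D_1 | D_0) for the two-state chain."""
    d = beta / (alpha + beta)
    return d + d * h2(alpha) + (1.0 - d) * h2(beta)


def expanded_content_plus_location(alpha, beta):
    """Sum of the leading terms of (4.4) and (4.5)."""
    return -beta * math.log2(beta) + beta * (
        (1.0 + h2(alpha)) / alpha + math.log2(math.e)
    )


for beta in (1e-2, 1e-3, 1e-4):
    exact = exact_content_plus_location(0.5, beta)
    approx = expanded_content_plus_location(0.5, beta)
    print(f"beta={beta:g}  exact={exact:.6e}  expansion={approx:.6e}  gap={abs(exact - approx):.2e}")
```

The gap shrinks roughly like $\beta^2 \log(1/\beta)$, in line with the $O(\beta^{2-\epsilon})$ remainder. The third term (4.6) has no closed form and is not checked here.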

The proofs of (4.4) and (4.5) are straightforward. The proof of (4.6) is
highly nontrivial and is the essence of the proof of Theorem 1.
The complete proof of (4.6) is given in Appendix C. In this
subsection we explain only the intuition of (4.6).

Let us first consider the case that the deletion is not bursty
($\alpha = 1$), i.e., no consecutive bits are deleted. In order to
evaluate nature's secret we need to estimate the uncertainty
in $D_1$ given $X^n$, $y(X^n, D^n)$, $D_0$ and $D_{n+1}$. The uncertainty is
significant if the first run of $X^n$ is different from the first run
of $y(X^n, D^n)$. For example, if $X^n = (0, 0, 0, 1)$ and $y(X^n, D^n) =
(0, 0, 1)$, we know that one bit is deleted in the first run (first
three bits) of $X^n$, but do not know which bit is deleted. The
true identity of the deleted bit is nature's secret. Since there
are three equally likely possible deletion patterns and only one
leads to $D_1 = 1$, the conditional entropy of $D_1$ is $h_2(1/3)$. The
length of the first run of $X^n$ is $L$, a geometrically distributed
random variable with parameter 1/2. If one bit is deleted in the
first run, the conditional entropy is $h_2(1/L)$. The probability
that any bit in $L$ bits is deleted is roughly $L\beta$, therefore the
average uncertainty is $E[h_2(1/L) L \beta] = (\sum_{l=1}^{\infty} h_2(1/l) 2^{-l} l) \beta =
(\sum_{l=1}^{\infty} 2^{-l-1} l \log l) \beta = C\beta$.⁴
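The identity between the expectation $E[h_2(1/L) L]$ and the series defining $C$ can be verified numerically. This is a minimal sketch with hypothetical function names; both sums are truncated at 200 terms, which is an assumption that is harmless here because the terms decay geometrically.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def secret_via_expectation(terms=200):
    """E[h2(1/L) * L] with P(L = l) = 2^{-l}, truncated after `terms` terms."""
    return sum(h2(1.0 / l) * l * 2.0 ** (-l) for l in range(1, terms + 1))


def secret_via_series(terms=200):
    """sum_l 2^{-l-1} * l * log2(l), truncated after `terms` terms."""
    return sum(2.0 ** (-l - 1) * l * math.log2(l) for l in range(1, terms + 1))


print(secret_via_expectation(), secret_via_series())
```

The two forms agree because $l\,h_2(1/l) = l \log l - (l-1)\log(l-1)$, so the first sum telescopes into the second after shifting the index by one.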

Let us now extend the discussion in the previous paragraph
to the case of bursty deletions ($\alpha < 1$). First, we need to
generalize the usual definition of "run" to $b$-run.

Definition 3: For any $b$ and $l \in \mathbb{Z}^+$, a sequence
$(x_1, \ldots, x_{b+l-1})$ is called a $b$-run of extent $l$ if $x_i = x_j$ holds
for all $i, j$ satisfying $i \equiv j \pmod{b}$.

For example, (1, 1, 1, 1, 1) is a 1-run of extent 5, and 1-run
is the usual definition of a run. The sequence (1, 0, 1, 0, 1) is
a 2-run of extent 4. Note that there are $l$ different ways to
delete $b$ consecutive bits in a sequence of length $l + b - 1$. A
special property of a $b$-run of extent $l$ is that all the $l$ ways
of deletion result in the same outcome. For example, all four
ways of deleting two consecutive bits in (1, 0, 1, 0, 1) lead to
the same outcome (1, 0, 1). This observation is formally stated
in the following fact.

Fact 1: Let $x^{b+l-1}$ be a $b$-run of extent $l$. Let $d_{(i)}$ denote the
sequence of $(i - 1)$ 0's followed by $b$ 1's, then followed by
$(l - i)$ 0's. Then $y(x^{b+l-1}, d_{(i)})$ is the same for all $i = 1, \ldots, l$.
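Fact 1 is easy to check by enumeration. The sketch below uses two hypothetical helpers: `is_b_run` tests the periodicity condition of Definition 3, and `burst_deletion_outcomes` collects the outcomes of all ways of deleting $b$ consecutive symbols.

```python
import itertools


def is_b_run(x, b):
    """True if x[i] == x[j] whenever i and j are congruent modulo b."""
    return all(x[i] == x[i - b] for i in range(b, len(x)))


def burst_deletion_outcomes(x, b):
    """Set of sequences obtained by deleting any b consecutive symbols of x."""
    return {tuple(x[:i]) + tuple(x[i + b:]) for i in range(len(x) - b + 1)}


x = (1, 0, 1, 0, 1)
print(is_b_run(x, 2), burst_deletion_outcomes(x, 2))  # True {(1, 0, 1)}

# Exhaustive check of Fact 1 on all binary sequences of length 6.
for b in (1, 2, 3):
    for seq in itertools.product((0, 1), repeat=6):
        if is_b_run(seq, b):
            assert len(burst_deletion_outcomes(seq, b)) == 1
```

A sequence that is not a $b$-run, such as (0, 1, 0, 1, 1) with $b = 2$, produces more than one outcome, so some burst positions become distinguishable.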

Definition 4: For any $b \in \mathbb{Z}^+$, the first $b$-run of a sequence
$(x_1, \ldots, x_n)$ is the longest segment starting from $x_1$ that is a
$b$-run.

For example, the first 2-run of (0, 1, 0, 1, 1) is (0, 1, 0, 1).
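Definition 4 translates into a short scan: the first $b$ symbols always belong to the first $b$-run, and the run is extended while each new symbol matches the symbol $b$ positions earlier. The function name `first_b_run` is a hypothetical helper for illustration.

```python
def first_b_run(x, b):
    """Return the longest prefix of x that is a b-run."""
    end = min(b, len(x))  # the first b symbols always form a b-run
    while end < len(x) and x[end] == x[end - b]:
        end += 1
    return tuple(x[:end])


print(first_b_run((0, 1, 0, 1, 1), 2))  # (0, 1, 0, 1)
```

With this convention the extent of the first $b$-run is `len(first_b_run(x, b)) - b + 1`, which equals 3 in the example.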

Now let us consider the uncertainty in $D_1$ given
$X^n$, $y(X^n, D^n)$, $D_0$ and $D_{n+1}$ through an example. If we know
that a burst of 2 bits is deleted in $X^n = (0, 1, 0, 1, 1)$ to
produce $y(X^n, D^n) = (0, 1, 1)$, we know that the deletion occurs
within the first 2-run, i.e., (0, 1, 0, 1). Since there are three
indistinguishable deletion patterns, (1, 1, 0, 0, 0), (0, 1, 1, 0, 0),
and (0, 0, 1, 1, 0), among which only the first one satisfies
$D_1 = 1$, the conditional entropy of $D_1$ is $h_2(1/3)$.
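The example can be reproduced by listing every single-burst deletion pattern that is consistent with the observed pair. This is a sketch under the simplifying assumption, used in the intuition above, that exactly one burst of known length occurred and that the consistent patterns are equally likely; the function names are hypothetical.

```python
import math


def apply_deletion(x, d):
    """Keep the bits of x at positions where the deletion pattern d is 0."""
    return tuple(bit for bit, deleted in zip(x, d) if not deleted)


def consistent_burst_patterns(x, y, b):
    """All patterns with one burst of b ones that map x to y."""
    n = len(x)
    patterns = []
    for start in range(n - b + 1):
        d = tuple(1 if start <= i < start + b else 0 for i in range(n))
        if apply_deletion(x, d) == tuple(y):
            patterns.append(d)
    return patterns


def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


patterns = consistent_burst_patterns((0, 1, 0, 1, 1), (0, 1, 1), 2)
# Under the equal-likelihood assumption, P(D_1 = 1) is the fraction of
# consistent patterns whose first entry is 1.
p_first_deleted = sum(d[0] for d in patterns) / len(patterns)
print(patterns, h2(p_first_deleted))
```

Three patterns survive the consistency check, one of which has $D_1 = 1$, so the uncertainty about $D_1$ is $h_2(1/3) \approx 0.918$ bits.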

⁴ In this section we only provide an intuitive explanation using a simplified
case that there is only one burst of deletion. In a rigorous proof it is shown
that with high probability the first burst of deletion can be isolated from the
other bursts so that the general case is reduced to the simplified case. See
Appendix C for details.

For any $b$, the extent of the first $b$-run, $L$, is a geometrically
distributed random variable with parameter 1/2, as in the
non-bursty case. This fact can be seen by sequentially generating
$X_1, X_2, \ldots$. For an arbitrary realization $X^b = x^b$, $X^b$ always
belongs to the first $b$-run. If the first $b$-run has been extended
to the $(i - 1)$-th bit, it will be extended to the $i$-th bit if $X_i =
X_{i-b}$, which occurs with probability 1/2. Therefore the extent
of the first $b$-run is a geometrically distributed variable. If
one burst of $b$ bits is deleted in the first $b$-run, the conditional
entropy of $D_1$ is $h_2(1/L)$. Since, given the burst length $b$,
the probability that any of the $L$ possible
deletion patterns occurs is roughly $L\beta$, the average uncertainty
is $E[h_2(1/L) L \beta] = C\beta$. Note that the result is the same for all
$b$. In other words, nature's secret is always $C \approx 1.29$ bits per
burst, regardless of the length of the burst.
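The claim that the extent of the first $b$-run is geometric with parameter 1/2 for every $b$ can be checked exactly on finite sequences. The sketch below enumerates all binary sequences of a fixed length and tabulates the extent; the function names are hypothetical, and the largest extent is excluded from the comparison because it is truncated by the finite length.

```python
import itertools
from collections import Counter
from fractions import Fraction


def first_b_run_extent(x, b):
    """Extent l of the first b-run of x, i.e. (length of the run) - b + 1."""
    end = b
    while end < len(x) and x[end] == x[end - b]:
        end += 1
    return end - b + 1


def extent_distribution(n, b):
    """Exact distribution of the extent over uniform binary sequences of length n."""
    counts = Counter(
        first_b_run_extent(x, b) for x in itertools.product((0, 1), repeat=n)
    )
    total = 2 ** n
    return {l: Fraction(c, total) for l, c in sorted(counts.items())}


for b in (1, 2, 3):
    print(b, extent_distribution(10, b))
```

For each $b$ the probability of extent $l$ is exactly $2^{-l}$ up to the truncation point, which is why the averaged uncertainty $C\beta$ does not depend on the burst length.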

Remark 6: Since nature's secret is $C\beta + O(\beta^{2-\epsilon})$ for any
given value of the burst length $b \in \mathbb{Z}^+$, nature's
secret averaged across different possible values of $b$ is also $C\beta +
O(\beta^{2-\epsilon})$, regardless of the distribution of the length of a burst
of deletions. This implies that Theorem 1 may generalize to
more general deletion processes beyond the two-state Markov
chains. In order to draw a rigorous statement, however, one
has to revisit Lemmas 1 and 2 and prove them for the general
setup.

V. Concluding Remarks 

We studied the distributed source coding problem of syn- 
chronizing source sequences based on bursty deletion side- 
information. We evaluated the two leading terms of the 
minimum achievable rate for small deletion rate. Directions 
for future work include considering insertions in addition to 
deletions, and evaluating the leading terms of the capacity of 
the bursty deletion channel. 

Appendix A 
Proof of Lemma 1

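(1) We first show that $R_n := (1/n) H(X^n \mid y(X^n, D^n), D_0, D_{n+1})$
converges as $n \to \infty$, so that the limit in the statement of
Lemma 1 is well defined.

For all $m \in \{1, \ldots, n - 1\}$, we have

$$\begin{aligned}
n R_n &= H(X^n \mid y(X^n, D^n), D_0, D_{n+1}) \\
&\overset{(a)}{\ge} H(X^n \mid y(X^m, D^m), y(X_{m+1}^n, D_{m+1}^n), D_0, D_{n+1}) \\
&\ge H(X^m \mid y(X^m, D^m), y(X_{m+1}^n, D_{m+1}^n), D_0, D_{n+1}, D_{m+1}) \\
&\quad + H(X_{m+1}^n \mid X^m, y(X^m, D^m), y(X_{m+1}^n, D_{m+1}^n), D_0, D_{n+1}, D_m) \\
&\overset{(b)}{=} H(X^m \mid y(X^m, D^m), D_0, D_{m+1}) + H(X_{m+1}^n \mid y(X_{m+1}^n, D_{m+1}^n), D_{n+1}, D_m) \\
&= H(X^m \mid y(X^m, D^m), D_0, D_{m+1}) + H(X^{n-m} \mid y(X^{n-m}, D^{n-m}), D_0, D_{n-m+1}) \\
&= m R_m + (n - m) R_{n-m},
\end{aligned}$$

where step (a) holds because the tuple
$(y(X^m, D^m), y(X_{m+1}^n, D_{m+1}^n))$ determines $y(X^n, D^n)$,
and step (b) holds because the Markov chains
$(y(X_{m+1}^n, D_{m+1}^n), D_{n+1}) - D_{m+1} - (X^m, y(X^m, D^m), D_0)$ and
$(X^m, y(X^m, D^m), D_0) - D_m - (X_{m+1}^n, y(X_{m+1}^n, D_{m+1}^n), D_{n+1})$ hold.
Therefore the sequence $\{n R_n\}_{n \in \mathbb{N}}$ is superadditive. By Fekete's
lemma [8], the limit $\lim_{n\to\infty} R_n$ exists.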



(2) Using the information-spectral version
of the Slepian-Wolf theorem [7, Section
7.2], we have $R_{\min} = \overline{H}(X^n \mid y(X^n, D^n)) :=
\text{p-}\limsup_{n\to\infty} (1/n) \log(1/p_{X^n \mid y(X^n, D^n)}(X^n \mid y(X^n, D^n)))$. In the
rest of this appendix, for any random variables $A$, $B$,
we abbreviate $p_A(A)$ and $p_{A|B}(A \mid B)$ to $p(A)$ and $p(A \mid B)$,
respectively, to avoid cumbersome notation.

(3) Now we show that the sequence of random variables
$(1/n) \log(1/p(X^n \mid y(X^n, D^n)))$ converges in probability to the
limit $\lim_{n\to\infty} R_n$.

We introduce a segmented deletion process as follows.
Let $k \ge 3$ be the length of a segment. Let $g := \lfloor n/k \rfloor$
be the number of complete segments and $l := n - gk$
be the length of the remainder. Consider the outcome of
a segmented deletion process as follows: let $z(X^n, D^n) :=
(Z_{1L}, Z_{1M}, Z_{1R}, \ldots, Z_{gL}, Z_{gM}, Z_{gR}, Z_{remainder})$ be a vector with
$(3g + 1)$ components, where for all $i = 1, \ldots, g$, $Z_{iL} :=
y(X_{(i-1)k+1}, D_{(i-1)k+1})$, $Z_{iM} := y(X_{(i-1)k+2}^{ik-1}, D_{(i-1)k+2}^{ik-1})$, $Z_{iR} :=
y(X_{ik}, D_{ik})$, and $Z_{remainder} := y(X_{gk+1}^n, D_{gk+1}^n)$. From $z(X^n, D^n)$
we can find out how many source bits are deleted in each
segment and the remainder, and whether the first and last bits
of each segment are deleted. The sequence $y(X^n, D^n)$ can be
obtained by merging all the $(3g + 1)$ components of $z(X^n, D^n)$.
Therefore the sequence $z(X^n, D^n)$ contains more information
than $y(X^n, D^n)$. We will first fix $k$ and let $n$ go to infinity.
Then we increase $k$ to prove the final result.
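The segmented outcome $z(X^n, D^n)$ can be sketched as follows. The function names are hypothetical, and the components are returned as a flat list in the order $Z_{1L}, Z_{1M}, Z_{1R}, \ldots, Z_{remainder}$.

```python
def apply_deletion(x, d):
    """Keep the bits of x at positions where the deletion pattern d is 0."""
    return tuple(bit for bit, deleted in zip(x, d) if not deleted)


def segmented_outcome(x, d, k):
    """Return the (3g + 1) components of z(x, d) for segment length k >= 3."""
    if k < 3:
        raise ValueError("segment length k must be at least 3")
    n = len(x)
    g = n // k  # number of complete segments
    components = []
    for i in range(g):
        lo, hi = i * k, (i + 1) * k
        components.append(apply_deletion(x[lo:lo + 1], d[lo:lo + 1]))          # Z_iL
        components.append(apply_deletion(x[lo + 1:hi - 1], d[lo + 1:hi - 1]))  # Z_iM
        components.append(apply_deletion(x[hi - 1:hi], d[hi - 1:hi]))          # Z_iR
    components.append(apply_deletion(x[g * k:], d[g * k:]))                    # remainder
    return components


x = (0, 1, 0, 1, 1, 0, 1, 0, 1, 0)
d = (0, 1, 1, 0, 0, 0, 1, 1, 1, 0)
z = segmented_outcome(x, d, k=3)
merged = tuple(bit for component in z for bit in component)
print(z, merged)
```

Merging the components recovers $y(X^n, D^n)$, while the component lengths additionally reveal how many bits were deleted in each segment, which is the extra information that $z$ carries over $y$.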

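The statement to be proved is based on the following three
facts.

Fact 2: For any $k \ge 3$, $n$ and any $\delta > 0$, there exists a
function $\epsilon_1(k)$ satisfying $\lim_{k\to\infty} \epsilon_1(k) = 0$, so that

$$P\left( \left| \frac{1}{n} \log \frac{1}{p(X^n \mid y(X^n, D^n))} - \frac{1}{n} \log \frac{1}{p(X^n \mid z(X^n, D^n))} \right| > \delta \right) < \frac{\epsilon_1(k)}{\delta}.$$

Fact 3: For any $k$ and any $\delta > 0$, there exists a function
$\epsilon_2(k)$ satisfying $\lim_{k\to\infty} \epsilon_2(k) = 0$, so that as $n \to \infty$,

$$P\left( \left| \frac{1}{n} \log \frac{1}{p(X^n \mid z(X^n, D^n))} - \frac{1}{k} H(X_2^{k-1} \mid y(X_2^{k-1}, D_2^{k-1}), D_1, D_k) \right| > \delta \right) < \frac{\epsilon_2(k)}{\delta}.$$

Fact 4:

$$\lim_{k\to\infty} \frac{1}{k} H(X_2^{k-1} \mid y(X_2^{k-1}, D_2^{k-1}), D_1, D_k) = \lim_{k\to\infty} \frac{1}{k} H(X^k \mid y(X^k, D^k), D_0, D_{k+1}).$$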

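Proof of Fact 2:

Since $y(X^n, D^n)$ can be determined by $z(X^n, D^n)$, there
exists a function $\phi_n$ such that $y(X^n, D^n) = \phi_n(z(X^n, D^n))$.
For any realization $z(X^n, D^n) = z$, we have
$P(z(X^n, D^n) = z) \le P(y(X^n, D^n) = \phi_n(z))$, which implies that
$(1/n) \log P(z(X^n, D^n) = z) - (1/n) \log P(y(X^n, D^n) = \phi_n(z)) \le 0$
always holds. Let $L_z$ be the vector of $(3g + 1)$ components
representing the lengths of all the components of $z(X^n, D^n)$.
Then we have

$$\begin{aligned}
E\left| \frac{1}{n} \log p(y(X^n, D^n)) - \frac{1}{n} \log p(z(X^n, D^n)) \right|
&= E\left[ \frac{1}{n} \log p(y(X^n, D^n)) - \frac{1}{n} \log p(z(X^n, D^n)) \right] \\
&= \frac{1}{n} \left( -H(y(X^n, D^n)) + H(z(X^n, D^n)) \right) \\
&= \frac{1}{n} H(z(X^n, D^n) \mid y(X^n, D^n)) \\
&= \frac{1}{n} H(L_z \mid y(X^n, D^n)) \\
&\le \frac{1}{n} H(L_z) \\
&\le \frac{1}{n} (3g + 1) \log k \\
&\le \frac{4 \log k}{k}.
\end{aligned}$$

By Markov's inequality,

$$P\left( \frac{1}{n} \left| \log p(y(X^n, D^n)) - \log p(z(X^n, D^n)) \right| > \delta \right) \le \frac{4 \log k}{k \delta}.$$

Using the same argument we also have

$$P\left( \frac{1}{n} \left| \log p(X^n, y(X^n, D^n)) - \log p(X^n, z(X^n, D^n)) \right| > \delta \right) \le \frac{4 \log k}{k \delta}.$$

Combining the last two inequalities completes the proof of
Fact 2. ■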

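Proof of Fact 3:

Let $Z_B := (Z_{1L}, Z_{1R}, \ldots, Z_{gL}, Z_{gR}, Z_{remainder})$. Then

$$\begin{aligned}
-\frac{1}{n} \log p(z(X^n, D^n))
&\overset{(c)}{=} -\frac{1}{n} \log p(Z_B) + \sum_{i=1}^{g} -\frac{1}{n} \log p(Z_{iM} \mid Z_B) \\
&\overset{(d)}{=} -\frac{1}{n} \log p(Z_B) + \sum_{i=1}^{g} -\frac{1}{n} \log p(Z_{iM} \mid Z_{iL}, Z_{iR}), \quad (A.1)
\end{aligned}$$

where step (c) holds because given $Z_B$, $Z_{1M}, \ldots, Z_{gM}$ are
conditionally independent, and step (d) holds because $D^n$ is
a Markov chain.

Since the expectation of the first term of (A.1) is equal to
$(1/n) H(Z_B) \le ((2g + 1)/n) \log 3$, by Markov's inequality we have
$P(-(1/n) \log p(Z_B) > \delta) \le (2g + 1) \log 3 / (n \delta)$.

Due to the law of large numbers, as $n \to \infty$, which
implies $g \to \infty$, the second term of (A.1) converges to
$(1/k) H(y(X_2^{k-1}, D_2^{k-1}) \mid D_1, D_k)$ in probability.

Therefore we have: for any $k$ and $n \to \infty$,

$$P\left( \left| \frac{1}{n} \log \frac{1}{p(z(X^n, D^n))} - \frac{1}{k} H(y(X_2^{k-1}, D_2^{k-1}) \mid D_1, D_k) \right| > \delta \right) < \frac{\epsilon_2'(k)}{\delta}$$

for some $\epsilon_2'(k)$ which vanishes as $k$ increases.

Using the same argument we also have

$$P\left( \left| \frac{1}{n} \log \frac{1}{p(X^n, z(X^n, D^n))} - \frac{1}{k} H(X_2^{k-1}, y(X_2^{k-1}, D_2^{k-1}) \mid D_1, D_k) \right| > \delta \right) < \frac{\epsilon_2'(k)}{\delta}.$$

Combining the last two inequalities completes the proof of
Fact 3. ■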

Proof of Fact 4: Fact 4 holds because (i) $p_{X_2^{k-1}, D_1^{k}} = p_{X^{k-2}, D_0^{k-1}}$
and (ii) $(k - 2)/k \to 1$ as $k \to \infty$. ■

Combining Facts 2 and 3 we have: for any fixed $k$ and $\delta$,
as $n \to \infty$,

$$P\left( \left| \frac{1}{n} \log \frac{1}{p(X^n \mid y(X^n, D^n))} - \frac{1}{k} H(X_2^{k-1} \mid y(X_2^{k-1}, D_2^{k-1}), D_1, D_k) \right| > \delta \right) < \frac{\epsilon_3(k)}{\delta} \quad (A.2)$$

for some $\epsilon_3(k)$ which vanishes as $k$ increases. By choosing
a large enough $k$, the right hand side of (A.2) can be made
arbitrarily small. Combining (A.2) and Fact 4, the sequence
of random variables $(1/n) \log(1/p(X^n \mid y(X^n, D^n)))$ is shown to
converge in probability to the limit $\lim_{n\to\infty} R_n$.

Combining (1), (2) and (3) we have $R_{\min} =
\lim_{n\to\infty} (1/n) H(X^n \mid y(X^n, D^n), D_0, D_{n+1})$.

Appendix B 
Proof of Lemma 2

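We first introduce a sequence $\{J_n\}_{n \in \mathbb{N}}$ and show that it converges to $R_{\min}$.

Lemma 3: For all $n \in \mathbb{Z}^+$, let $J_n := d +
(1/n) H(y(X^n, D^n) \mid X^n, D_0, D_{n+1})$. Then we have
$\lim_{n\to\infty} J_n = R_{\min}$.

Proof: We have

$$\begin{aligned}
R_{\min} &= \lim_{n\to\infty} \frac{1}{n} H(X^n \mid y(X^n, D^n), D_0, D_{n+1}) \\
&= \lim_{n\to\infty} \frac{1}{n} \left[ H(X^n \mid D_0, D_{n+1}) + H(y(X^n, D^n) \mid X^n, D_0, D_{n+1}) - H(y(X^n, D^n) \mid D_0, D_{n+1}) \right] \\
&= 1 + \lim_{n\to\infty} \frac{1}{n} H(y(X^n, D^n) \mid X^n, D_0, D_{n+1}) \\
&\quad - \lim_{n\to\infty} \frac{1}{n} \left( H(L_y \mid D_0, D_{n+1}) + H(y(X^n, D^n) \mid L_y, D_0, D_{n+1}) \right).
\end{aligned}$$

Since

$$0 \le \lim_{n\to\infty} \frac{1}{n} H(L_y \mid D_0, D_{n+1}) \le \lim_{n\to\infty} \frac{1}{n} \log(n + 1) = 0,$$

we have $\lim_{n\to\infty} (1/n) H(L_y \mid D_0, D_{n+1}) = 0$. Since given $L_y =
l$ and given $(D_0, D_{n+1})$ the sequence $y(X^n, D^n)$ is an iid
Bernoulli(1/2) sequence, $H(y(X^n, D^n) \mid L_y = l, D_0, D_{n+1}) = l$
holds. Therefore $H(y(X^n, D^n) \mid L_y, D_0, D_{n+1}) = E[L_y]$ and hence

$$\lim_{n\to\infty} \frac{1}{n} H(y(X^n, D^n) \mid L_y, D_0, D_{n+1}) = \lim_{n\to\infty} \frac{1}{n} E[L_y] = \frac{\alpha}{\alpha + \beta} = 1 - d.$$

In conclusion,

$$\begin{aligned}
R_{\min} &= 1 + \lim_{n\to\infty} \frac{1}{n} H(y(X^n, D^n) \mid X^n, D_0, D_{n+1}) - (1 - d) \\
&= \lim_{n\to\infty} \left( d + \frac{1}{n} H(y(X^n, D^n) \mid X^n, D_0, D_{n+1}) \right) \\
&= \lim_{n\to\infty} J_n,
\end{aligned}$$

which completes the proof of Lemma 3. ■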
Now let us use Lemma [3] to prove Lemma |2] 
Expanding Do, in two ways, we 

have 

Do, £)„+i) - H(Di|Z",y(X«,Z)"), Do, 
= H{y(X\ D")\X", Do, D„+i) - H(y{X'\ D")\X", Do, Dj, D„+ 

(B. 



The first term on the left side of ( IB. 3) is equal to 
H(Di\Do, D„+i). The second term on the left side of ( IB.3b 
is denoted by £„. The first term on the right side of (IB. 3) is 
equal to «(7„ - ^f). The second term on the right side of (IB. 3) 
is: 

i/0;(X«,D")|r',Do,Di,D„+i) 
= H(y(X'\D")\X'\DuD„^i) 
- i/0;(X«,D«)|X",Di = l,D„+i)pB,(l) 

+HiyiX",D")\X",Di = 0,D„+i)pB,(0) 
= H{y{Xl,m\XuXl,D, = l,D„+i)/,B,(l) 

+i/(Xi,y(X^,D^)|Xi,X^',D, = 0,D„+,)pD,(0) 

H{y(X"2,D"2)\X"2,Di = l,D„+i)/7B,(l) 
+//(y(X^,D^)|X?,Di = 0,D„+i)/7o,(0) 

= //(3;(X2",D^)K',D,,D„+i) 

= H(yiX"-\D"-')\X"-\Do,D„) 

= (n- l)(y„-i -fl!), 

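where step (e) holds because $X_1$ is independent of $(D_2^n, X_2^n, y(X_2^n, D_2^n))$. Therefore (B.3) becomes

$$
H(D_1 \mid D_0, D_{n+1}) - E_n = n(J_n - J_{n-1}) + J_{n-1} - d.
\tag{B.4}
$$

Now let us take the limit as $n \to \infty$ on both sides of (B.4). Because of the mixing of the Markov chain $\{D_i\}_{i \ge 0}$, the distribution $p_{D_{n+1} \mid D_0, D_1}(\cdot \mid d_0, d_1)$ converges to the stationary distribution regardless of the initial values $(d_0, d_1)$ as $n$ goes to infinity. Therefore $\lim_{n\to\infty} H(D_1 \mid D_0, D_{n+1}) = H(D_1 \mid D_0)$. For the second term on the left side of (B.4), Lemma 4 guarantees the convergence of $\{E_n\}_{n \ge 1}$.

Lemma 4: (1) The sequence $\{E_n\}_{n \ge 1}$ is nondecreasing. (2) $E_\infty := \lim_{n\to\infty} E_n$ exists.

Proof: (1) For all $n \ge 2$, we have

$$
\begin{aligned}
E_n &= H(D_1 \mid X^n, y(X^n, D^n), D_0, D_{n+1}) \\
&\ge H(D_1 \mid X^n, y(X^n, D^n), D_0, D_n, D_{n+1}) \\
&= H(D_1 \mid X^n, y(X^n, D^n), D_0, D_n) \\
&= H(D_1 \mid X^n, y(X^n, D^n), D_0, D_n = 1)\, p_{D_n}(1) + H(D_1 \mid X^n, y(X^n, D^n), D_0, D_n = 0)\, p_{D_n}(0) \\
&= H(D_1 \mid X^{n-1}, X_n, y(X^{n-1}, D^{n-1}), D_0, D_n = 1)\, p_{D_n}(1) \\
&\quad + H(D_1 \mid X^{n-1}, X_n, y(X^{n-1}, D^{n-1}), D_0, D_n = 0)\, p_{D_n}(0) \\
&= H(D_1 \mid X^{n-1}, y(X^{n-1}, D^{n-1}), D_0, D_n) \\
&= E_{n-1}.
\end{aligned}
$$

Therefore $\{E_n\}_{n \ge 1}$ is nondecreasing.

(2) Since $E_n \le 1$ holds for all $n$ (it is the conditional entropy of the binary variable $D_1$) and $\{E_n\}_{n \ge 1}$ is nondecreasing, $E_\infty = \lim_{n\to\infty} E_n$ exists. ■

By Lemma 4, the left side of (B.4) converges to $H(D_1 \mid D_0) - E_\infty$ as $n \to \infty$. Since (B.4) holds, the right side also converges and the limit is $\left(\lim_{n\to\infty} n(J_n - J_{n-1})\right) + R_{\min} - d$. Since $\{J_n\}_{n \ge 1}$ is a converging sequence and $\lim_{n\to\infty} n(J_n - J_{n-1})$ exists, $\lim_{n\to\infty} n(J_n - J_{n-1}) = 0$ (otherwise $J_n - J_{n-1}$ would behave like $c/n$ for some $c \ne 0$, and the partial sums would diverge). Therefore in the limit as $n \to \infty$, (B.4) becomes

$$
H(D_1 \mid D_0) - E_\infty = R_{\min} - d,
$$

which completes the proof of Lemma 2.

Appendix C
Proof of Theorem 1

When $\alpha$ is a fixed constant and $\beta \ll 1$, it is easy to verify that the first two terms of (3.3) are

$$
\begin{aligned}
d + H(D_1 \mid D_0) &= \frac{\beta}{\alpha + \beta} + \frac{\alpha\, h_2(\beta)}{\alpha + \beta} + \frac{\beta\, h_2(\alpha)}{\alpha + \beta} \\
&= -\beta \log \beta + \beta \left( \frac{1 + h_2(\alpha)}{\alpha} + \log e \right) + O(\beta^{2 - \epsilon})
\end{aligned}
$$

for any $\epsilon > 0$. We will show that the third term of (3.3) satisfies $E_\infty = C\beta + O(\beta^{2 - \epsilon})$.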
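The small-$\beta$ behaviour of the first two terms can be checked numerically. The sketch below compares $d + H(D_1 \mid D_0) = \frac{\beta}{\alpha+\beta} + \frac{\alpha h_2(\beta) + \beta h_2(\alpha)}{\alpha+\beta}$ with the leading terms $-\beta\log\beta + \beta\left(\frac{1+h_2(\alpha)}{\alpha} + \log e\right)$. Logarithms are assumed to be base 2, and the function names and the value $\alpha = 0.5$ are illustrative choices, not taken from the paper.

```python
import math


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def exact_first_two_terms(alpha, beta):
    """d + H(D_1 | D_0), where d = beta / (alpha + beta) is the stationary deletion rate."""
    d = beta / (alpha + beta)
    h_d1_given_d0 = (alpha * h2(beta) + beta * h2(alpha)) / (alpha + beta)
    return d + h_d1_given_d0


def small_beta_expansion(alpha, beta):
    """Leading terms: -beta*log(beta) + beta*((1 + h2(alpha))/alpha + log e)."""
    return -beta * math.log2(beta) + beta * (
        (1.0 + h2(alpha)) / alpha + math.log2(math.e)
    )


if __name__ == "__main__":
    alpha = 0.5  # illustrative value (assumption)
    for beta in (1e-2, 1e-3, 1e-4):
        gap = exact_first_two_terms(alpha, beta) - small_beta_expansion(alpha, beta)
        # The gap should shrink roughly like beta^2 * log(1/beta).
        print(f"beta={beta:g}  gap={gap:.3e}")
```

The gap between the two expressions is expected to be of order $\beta^2 \log(1/\beta)$, which is within $O(\beta^{2-\epsilon})$ for every $\epsilon > 0$.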

Let us first define "typicality" of the deletion pattern. Since $E_\infty$ is the conditional entropy of $D_1$, which depends mostly on the first few bits of $D_0^n$, the typicality of $D_0^n$ concerns only the first few bits.

Definition 5: Let $k = \max\{6, -6/\log(1 - \alpha)\}$. For $n > -k \log \beta$, the deletion pattern $D_0^n$ is typical if the following two conditions hold.

1) There is at most one run of 1's in $(D_0, \ldots, D_{-k \log \beta})$.

2) There are no more than $(-\frac{k}{3} \log \beta)$ 1's in $(D_0, \ldots, D_{-k \log \beta})$.
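The two conditions of Definition 5 can be sketched as a direct check on a deletion pattern. This is a minimal illustration under stated assumptions: logarithms are base 2, $k$ is taken as $\max\{6, 6/|\log(1-\alpha)|\}$, the window length $-k\log\beta$ is rounded up to an integer, and the function names are hypothetical.

```python
import math


def typicality_window(alpha, beta):
    """Return (k, w), where w = ceil(-k * log2(beta)) is the window length (rounding is an assumption)."""
    k = max(6.0, 6.0 / abs(math.log2(1.0 - alpha)))
    return k, math.ceil(-k * math.log2(beta))


def count_runs_of_ones(bits):
    """Number of maximal runs of consecutive 1's in a 0/1 sequence."""
    runs, prev = 0, 0
    for b in bits:
        if b == 1 and prev == 0:
            runs += 1
        prev = b
    return runs


def is_typical(d, alpha, beta):
    """Check Definition 5 for d = [D_0, D_1, ..., D_n] given as a list of 0/1."""
    k, w = typicality_window(alpha, beta)
    n = len(d) - 1
    if n <= -k * math.log2(beta):
        raise ValueError("Definition 5 requires n > -k log(beta)")
    window = d[: w + 1]  # (D_0, ..., D_w)
    max_ones = -(k / 3.0) * math.log2(beta)
    # Condition 1: at most one run of 1's; condition 2: bounded number of 1's.
    return count_runs_of_ones(window) <= 1 and sum(window) <= max_ones
```

For example, with $\alpha = 0.5$ and $\beta = 0.01$ the window has about 40 positions, and a pattern with a single short burst inside it is typical while a pattern with two separate bursts is not.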

Lemma 5 states that the deletion pattern is typical with high probability.

Lemma 5: For any $\epsilon > 0$, the probability that $D_0^n$ is typical is at least $1 - O(\beta^{2 - \epsilon})$.

Proof: Since any deletion pattern that has $r$ runs of 1's in $(D_0, \ldots, D_{-k \log \beta})$ occurs with probability $O(\beta^r)$ and there are no more than $(-k \log \beta)^{2r}$ such patterns, $P((D_0, \ldots, D_{-k \log \beta})$ contains $r$ runs of 1's$) = O(\beta^{r - \epsilon})$ for any $\epsilon > 0$. Hence condition 1) of Definition 5 holds with probability $1 - O(\beta^{2 - \epsilon})$. Given that condition 1) holds, condition 2) is violated if there is a burst of deletion longer than $(-\frac{k}{3} \log \beta)$, which occurs with probability $O((1 - \alpha)^{-\frac{k}{3} \log \beta}) = O(\beta^2)$. In conclusion, $P(D_0^n \text{ is typical}) = 1 - O(\beta^{2 - \epsilon})$ for any $\epsilon > 0$. ■

Let the indicator random variable $T := 1$ if $D_0^n$ is typical and $T := 0$ otherwise. Lemma 5 implies that $p_T(0) = O(\beta^{2 - \epsilon})$, $\forall \epsilon > 0$. Lemma 6 states that we can focus on the typical case $T = 1$ in order to evaluate $E_\infty$ to the precision of $O(\beta^{2 - \epsilon})$.

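Lemma 6:

$$
E_\infty = \lim_{n\to\infty} H(D_1 \mid X^n, y(X^n, D^n), D_0, D_{n+1}, T = 1)\, p_T(1) + O(\beta^{2 - \epsilon}).
$$

Proof: For all $n > -k \log \beta$, we have the following lower bound of $E_n$

$$
\begin{aligned}
E_n &\ge H(D_1 \mid X^n, y(X^n, D^n), D_0, D_{n+1}, T) \\
&\ge H(D_1 \mid X^n, y(X^n, D^n), D_0, D_{n+1}, T = 1)\, p_T(1),
\end{aligned}
$$

and the following upper bound

$$
\begin{aligned}
E_n &\le H(D_1, T \mid X^n, y(X^n, D^n), D_0, D_{n+1}) \\
&= H(D_1 \mid X^n, y(X^n, D^n), D_0, D_{n+1}, T) + H(T \mid X^n, y(X^n, D^n), D_0, D_{n+1}) \\
&\le H(D_1 \mid X^n, y(X^n, D^n), D_0, D_{n+1}, T = 1)\, p_T(1) \\
&\quad + H(D_1 \mid X^n, y(X^n, D^n), D_0, D_{n+1}, T = 0)\, p_T(0) + H(T) \\
&\le H(D_1 \mid X^n, y(X^n, D^n), D_0, D_{n+1}, T = 1)\, p_T(1) + p_T(0) + H(T) \\
&= H(D_1 \mid X^n, y(X^n, D^n), D_0, D_{n+1}, T = 1)\, p_T(1) + O(\beta^{2 - \epsilon}).
\end{aligned}
$$

Taking the limit as $n \to \infty$ completes the proof. ■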
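For all $n > -k \log \beta$, we have

$$
\begin{aligned}
& H(D_1 \mid X^n, y(X^n, D^n), D_0, D_{n+1}, T = 1)\, p_T(1) \\
&= H(D_1 \mid X^n, y(X^n, D^n), D_0 = 1, D_{n+1}, T = 1)\, p_{D_0, T}(1, 1) \\
&\quad + H(D_1 \mid X^n, y(X^n, D^n), D_0 = 0, D_{n+1}, T = 1)\, p_{D_0, T}(0, 1).
\end{aligned}
$$

We will separately analyze the following two cases: (1) $D_0 = 1, T = 1$ and (2) $D_0 = 0, T = 1$.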

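• Case (1): $D_0 = 1, T = 1$. In this case we check whether the first $-k \log \beta$ bits of $X^n$ and of $y(X^n, D^n)$ agree, that is, whether $X^{-k \log \beta} = Y^{-k \log \beta}$. Define $M_1 := 1$ if they are equal and $M_1 := 0$ otherwise. Note that $M_1$ is determined by $X^n$ and $y(X^n, D^n)$.

  - Case (1.1): $D_0 = 1, T = 1, M_1 = 0$. There exists at least one 1 in $D_1^{-k \log \beta}$. Since $D_0 = 1$ and there is at most one run of 1's in $D_0^{-k \log \beta}$ in a typical deletion pattern, $D_1 = 1$ must hold. Therefore $H(D_1 \mid D_0 = 1, T = 1, M_1 = 0) = 0$.

  - Case (1.2): $D_0 = 1, T = 1, M_1 = 1$. In this case, both $D_1 = 0$ and $D_1 = 1$ are possible. Given $D_0 = 1, T = 1$, if $D_1 = 0$, then $D_i = 0$ for all $i = 2, \ldots, -k \log \beta$, which implies that $X^{-k \log \beta} = Y^{-k \log \beta}$; if $D_1 = 1$, then for all $i = 1, \ldots, -k \log \beta$, $X_i$ and $Y_i$ are independently generated fair bits, hence the event $X_i = Y_i$ occurs with probability $1/2$. Since the events $\{X_i = Y_i\}_i$ are independent across $i$, $P(M_1 = 1 \mid D_1 = 1, D_0 = 1, T = 1) = (1/2)^{-k \log \beta} = O(\beta^2)$. Since $P(D_1 = 1 \mid D_0 = 1, T = 1) = \Theta(1)$ and $P(D_1 = 0 \mid D_0 = 1, T = 1) = \Theta(1)$, by Bayes' rule, we have $P(D_1 = 1 \mid D_0 = 1, T = 1, M_1 = 1) = O(\beta^2)$. Therefore $H(D_1 \mid D_0 = 1, T = 1, M_1 = 1) = O(\beta^{2 - \epsilon})$, $\forall \epsilon > 0$.

  In conclusion, the contribution of Case (1) to $E_\infty$ is

$$
\begin{aligned}
& H(D_1 \mid X^n, y(X^n, D^n), (D_0, T) = (1, 1), D_{n+1})\, p_{D_0, T}(1, 1) \\
&= H(D_1 \mid X^n, y(X^n, D^n), (D_0, T) = (1, 1), D_{n+1}, M_1)\, p_{D_0, T}(1, 1) \\
&= O(\beta^{2 - \epsilon}).
\end{aligned}
$$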

• Case (2): $D_0 = 0, T = 1$. In this case we will first check whether $X^{-\frac{k}{3} \log \beta} = Y^{-\frac{k}{3} \log \beta}$. Define $M_2 := 1$ if they are equal and $M_2 := 0$ otherwise.

  - Case (2.1): $D_0 = 0, T = 1, M_2 = 1$. By the same argument as in Case (1.2) for $M_1 = 1$, we have $P(D_1 = 1 \mid D_0 = 0, T = 1, M_2 = 1) = O(\beta^2)$, and $H(D_1 \mid D_0 = 0, T = 1, M_2 = 1)\, p_{D_0, T, M_2}(0, 1, 1) = O(\beta^{2 - \epsilon})$, $\forall \epsilon > 0$.

  - Case (2.2): $D_0 = 0, T = 1, M_2 = 0$. We try to find a length-$(-\frac{k}{3} \log \beta)$ segment in $Y^{-k \log \beta}$ that matches $X_{-\frac{2k}{3} \log \beta + 1}^{-k \log \beta}$. Since (i) $M_2 = 0$ implies that at least one bit in the first $-\frac{k}{3} \log \beta$ bits is deleted and (ii) a burst of deletion in a typical deletion pattern is no longer than $-\frac{k}{3} \log \beta$, there must be no deletion in $X_{-\frac{2k}{3} \log \beta + 1}^{-k \log \beta}$, which implies that there must be at least one segment in $Y^{-k \log \beta}$ that matches $X_{-\frac{2k}{3} \log \beta + 1}^{-k \log \beta}$. Define $B := 0$ if there are two or more segments that match $X_{-\frac{2k}{3} \log \beta + 1}^{-k \log \beta}$; and for $b \in \mathbb{Z}^+$, define $B := b$ if there is a unique segment $Y_{-\frac{2k}{3} \log \beta + 1 - b}^{-k \log \beta - b}$ that matches $X_{-\frac{2k}{3} \log \beta + 1}^{-k \log \beta}$ with an offset $b$.

    * Case (2.2.1): $D_0 = 0, T = 1, M_2 = 0, B = 0$. The condition $B = 0$ requires at least $(-\frac{k}{3} \log \beta)$ independent bit-wise matches, each of which occurs with probability $1/2$. Hence $B = 0$ occurs with probability at most $(1/2)^{-\frac{k}{3} \log \beta} = O(\beta^2)$. Therefore the contribution of Case (2.2.1) is $H(D_1 \mid D_0 = 0, T = 1, M_2 = 0, B = 0)\, p_{D_0, T, M_2, B}(0, 1, 0, 0) = O(\beta^2)$.

    * Case (2.2.2): $D_0 = 0, T = 1, M_2 = 0, B = b \in \mathbb{Z}^+$. There must be a burst of deletion of length $b$ taking place in $D^{-\frac{2k}{3} \log \beta}$, which causes the offset of $b$ between $X_{-\frac{2k}{3} \log \beta + 1}^{-k \log \beta}$ and the matching segment in $y(X^n, D^n)$. Since the length of the burst is bounded by $(-\frac{k}{3} \log \beta)$ in a typical deletion pattern, $b \le (-\frac{k}{3} \log \beta)$ must hold. Since we can find a correct correspondence between a segment of $X^n$ and its outcome of deletion, the deletion process to the left of the segment is conditionally independent of the deletion process to the right. Therefore in order to evaluate the conditional entropy of $D_1$ we need to focus on the process to the left of the segment only. Hence the contribution of this case to $E_\infty$ is:

$$
\begin{aligned}
& \sum_b H(D_1 \mid X^n, y(X^n, D^n), D_{n+1}, T = 1, D_0 = 0, M_2 = 0, B = b)\, p_{T, D_0, M_2, B}(1, 0, 0, b) \\
&= \sum_b H(D_1 \mid X^{n'}, y(X^{n'}, D^{n'}), T = 1, D_0 = 0, M_2 = 0, B = b)\, p_{T, D_0, M_2, B}(1, 0, 0, b),
\end{aligned}
$$

      where $n' := -\frac{2k}{3} \log \beta$. Lemma 7 will show that the contribution of Case (2.2.2) is $C\beta + O(\beta^{2 - \epsilon})$. This is the only case that is responsible for the leading term $C\beta$ in $E_\infty$.

As a summary, the contribution of all the cases (1.1), (1.2), (2.1), (2.2.1) to $E_\infty$ is of order $O(\beta^{2 - \epsilon})$. Lemma 7 will show that the contribution of Case (2.2.2) is $C\beta + O(\beta^{2 - \epsilon})$, which will complete the proof of Theorem 1.
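The segment-matching step of Case (2.2) can be sketched as follows. The code looks for copies of a fixed segment of $x$ inside $y$ shifted left by an offset $b$, and labels the outcome in the same way as the variable $B$. Indices are 0-based here, the search is restricted to offsets $1, \ldots,$ `max_offset`, and the function names and parameters are hypothetical choices made for illustration.

```python
def matching_offsets(x, y, start, length, max_offset):
    """Offsets b in 1..max_offset such that y[start-b : start-b+length] == x[start : start+length]."""
    segment = x[start:start + length]
    offsets = []
    for b in range(1, max_offset + 1):
        lo = start - b
        if lo < 0:
            break  # the shifted segment would start before y begins
        if y[lo:lo + length] == segment:
            offsets.append(b)
    return offsets


def offset_label(x, y, start, length, max_offset):
    """Mimic B: 0 if two or more segments match, b if exactly one matches at offset b, None if none."""
    offsets = matching_offsets(x, y, start, length, max_offset)
    if len(offsets) >= 2:
        return 0
    if len(offsets) == 1:
        return offsets[0]
    return None


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
    # y is x with a burst of two deletions at 0-based positions 1 and 2.
    y = [1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
    print(offset_label(x, y, start=6, length=4, max_offset=6))
```

In this toy example the segment `x[6:10]` reappears in `y` only at offset 2, which is the length of the deleted burst; an all-zero source would instead produce many matches and the label 0.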

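Lemma 7: For $n' := -\frac{2k}{3} \log \beta$, we have

$$
\sum_{b=1}^{-\frac{k}{3} \log \beta} H(D_1 \mid X^{n'}, y(X^{n'}, D^{n'}), (T, D_0, M_2, B) = (1, 0, 0, b))\, p_{T, D_0, M_2, B}(1, 0, 0, b) = C\beta + O(\beta^{2 - \epsilon}).
$$

Proof: Using the abbreviation $Y := y(X^{n'}, D^{n'})$, we have

$$
\begin{aligned}
& \sum_{b=1}^{-\frac{k}{3} \log \beta} H(D_1 \mid X^{n'}, Y, (T, D_0, M_2, B) = (1, 0, 0, b))\, p_{T, D_0, M_2, B}(1, 0, 0, b) \\
&= \sum_{b=1}^{-\frac{k}{3} \log \beta} \sum_{x^{n'}, y} H(D_1 \mid X^{n'} = x^{n'}, Y = y, (T, D_0, M_2, B) = (1, 0, 0, b)) \\
&\qquad \times p_{X^{n'}, Y, T, D_0, M_2, B}(x^{n'}, y, 1, 0, 0, b) \qquad \text{(C.5)} \\
&\overset{(f)}{=} \sum_{b=1}^{-\frac{k}{3} \log \beta} \sum_{x^{n'}} H(D_1 \mid X^{n'} = x^{n'}, Y = x_{b+1}^{n'}, (T, D_0, M_2, B) = (1, 0, 0, b)) \\
&\qquad \times p_{X^{n'}, Y, T, D_0, M_2, B}(x^{n'}, x_{b+1}^{n'}, 1, 0, 0, b) \\
&\overset{(g)}{=} \sum_{b=1}^{-\frac{k}{3} \log \beta} \sum_{x^{n'}} H(D_1 \mid X^{n'} = x^{n'}, Y = x_{b+1}^{n'}, (T, D_0, B) = (1, 0, b)) \\
&\qquad \times p_{X^{n'}, Y, T, D_0, B}(x^{n'}, x_{b+1}^{n'}, 1, 0, b) + O(\beta^2), \qquad \text{(C.6)}
\end{aligned}
$$

where step (f) holds because of the following reason. Given $(T, D_0, M_2, B) = (1, 0, 0, b)$, if $D_1 = 1$, then $D_1^b = 1$ and $D_{b+1}^{n'} = 0$ hold, which imply that $Y = X_{b+1}^{n'}$. Therefore the conditional entropy in (C.5) is nonzero only if $y = x_{b+1}^{n'}$. Step (g) holds because given $y = x_{b+1}^{n'}$, the probability that $M_2 = 1$ is of order $O(\beta^2)$.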

Define $l_b(\cdot) : \{0, 1\}^* \to \mathbb{Z}^+$ to be the length of the first $b$-run of $x^{n'}$ (cf. Definitions 3 and 4). In other words, for $l = 1, 2, \ldots$, $l_b(x^{n'}) := l$ if (i) $\forall\, b < i < b + l$, $x_i = x_{i-b}$ and (ii) $x_{b+l} \ne x_l$. Let $d_{i,b}$ denote the sequence $d^{n'} \in \{0, 1\}^{n'}$ satisfying that if $j = i, \ldots, i + b - 1$, then $d_j = 1$, otherwise $d_j = 0$. Due to Fact 1, if $l_b(x^{n'}) = l$, then $y(x^{n'}, d_{i,b}) = x_{b+1}^{n'}$ holds for all $i = 1, \ldots, l$, but does not hold for any $i > l$. Since given $D_0 = 0$ and $D_{n'+1} = 0$ all $l$ deletion patterns $\{d_{i,b}\}_{i=1}^{l}$ occur with the same probability $\alpha (1 - \alpha)^{b-1} \beta (1 - \beta)^{n' - b}$ and only one of them, $d_{1,b}$, satisfies $D_1 = 1$, we have $H(D_1 \mid X^{n'} = x^{n'}, Y = x_{b+1}^{n'}, l_b(X^{n'}) = l, (T, D_0, B) = (1, 0, b)) = h_2(1/l)$.
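The ambiguity described above can be illustrated on a small example. The sketch below computes the first $b$-run length $l_b(x)$ and confirms that the burst patterns $d_{1,b}, \ldots, d_{l,b}$ all produce the same output $x_{b+1}^{n}$, while $d_{l+1,b}$ does not. The helper names and the sample sequence are illustrative assumptions; positions are 1-based in the docstrings to match the text.

```python
def apply_deletions(x, d):
    """y(x, d): keep x_j exactly when d_j = 0."""
    return [xj for xj, dj in zip(x, d) if dj == 0]


def burst_pattern(n, i, b):
    """d_{i,b}: length-n pattern with 1's at 1-based positions i, ..., i+b-1 and 0's elsewhere."""
    return [1 if i <= j <= i + b - 1 else 0 for j in range(1, n + 1)]


def first_b_run_length(x, b):
    """l_b(x): smallest l >= 1 with x_{b+l} != x_l (1-based); None if no such l exists."""
    n = len(x)
    for l in range(1, n - b + 1):
        if x[b + l - 1] != x[l - 1]:
            return l
    return None


if __name__ == "__main__":
    x = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]  # sample sequence (assumption)
    b = 2
    l = first_b_run_length(x, b)
    target = x[b:]  # x_{b+1}^n, the output when the burst starts at position 1
    for i in range(1, l + 2):
        same = apply_deletions(x, burst_pattern(len(x), i, b)) == target
        print(f"burst starting at i={i}: matches x[b:]? {same}")
```

Because exactly one of the $l$ equally likely candidate bursts starts at position 1, the posterior probability of $D_1 = 1$ is $1/l$, which is where the term $h_2(1/l)$ comes from.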



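For a sequence $x^{n'}$ satisfying $l_b(x^{n'}) = l$, we have

$$
\begin{aligned}
p_{X^{n'}, Y, T, D_0, B}(x^{n'}, x_{b+1}^{n'}, 1, 0, b)
&= p_{X^{n'}}(x^{n'}) \sum_{i=1}^{l} p_{D^{n'} \mid D_0, D_{n'+1}}(d_{i,b} \mid 0, 0) \\
&= p_{X^{n'}}(x^{n'})\, l\, \alpha (1 - \alpha)^{b-1} \beta (1 - \beta)^{n' - b} \\
&= p_{X^{n'}}(x^{n'})\, l\, \alpha (1 - \alpha)^{b-1} \beta \left(1 - O(\beta^{1 - \epsilon})\right)
\end{aligned}
$$

for any $\epsilon > 0$.

Therefore we continue (C.6) as

$$
\begin{aligned}
\text{(C.6)} &= \sum_{b=1}^{-\frac{k}{3} \log \beta} \sum_{l=1}^{n' - b} \sum_{x^{n'} :\, l_b(x^{n'}) = l} h_2\!\left(\frac{1}{l}\right) p_{X^{n'}}(x^{n'})\, l\, \alpha (1 - \alpha)^{b-1} \beta \left(1 - O(\beta^{1 - \epsilon})\right) + O(\beta^2) \\
&= \sum_{b=1}^{-\frac{k}{3} \log \beta} \sum_{l=1}^{n' - b} h_2\!\left(\frac{1}{l}\right) 2^{-l}\, l\, \alpha (1 - \alpha)^{b-1} \beta \left(1 - O(\beta^{1 - \epsilon})\right) + O(\beta^2) \\
&\overset{(h)}{=} \sum_{b=1}^{\infty} \sum_{l=1}^{\infty} h_2\!\left(\frac{1}{l}\right) 2^{-l}\, l\, \alpha (1 - \alpha)^{b-1} \beta + O(\beta^{2 - \epsilon}) \\
&= \sum_{l=1}^{\infty} h_2\!\left(\frac{1}{l}\right) 2^{-l}\, l\, \beta + O(\beta^{2 - \epsilon}) \\
&\overset{(i)}{=} C\beta + O(\beta^{2 - \epsilon}),
\end{aligned}
$$

where step (h) holds because $k = \max\{6, -6/\log(1 - \alpha)\}$ and $n' = -\frac{2k}{3} \log \beta$, which guarantee that changing the limits of summations to infinity only leads to a change of order $O(\beta^2)$, and step (i) holds because $\sum_{l=1}^{\infty} h_2(1/l)\, 2^{-l}\, l = \sum_{l=1}^{\infty} 2^{-l-1}\, l \log l$.

■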
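The identity used in step (i) follows from $l\, h_2(1/l) = l \log l - (l - 1) \log (l - 1)$ and a shift of the summation index. It can also be checked numerically. The sketch below evaluates both series with a finite number of terms, assuming base-2 logarithms; the function names and the truncation at 200 terms are illustrative choices.

```python
import math


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def series_via_entropy(terms=200):
    """Truncated sum of h2(1/l) * 2^(-l) * l over l = 1..terms."""
    return sum(h2(1.0 / l) * 2.0 ** (-l) * l for l in range(1, terms + 1))


def series_via_log(terms=200):
    """Truncated sum of 2^(-l-1) * l * log2(l) over l = 1..terms."""
    return sum(2.0 ** (-l - 1) * l * math.log2(l) for l in range(1, terms + 1))


if __name__ == "__main__":
    a = series_via_entropy()
    b = series_via_log()
    print(f"entropy form: {a:.6f}")
    print(f"log form:     {b:.6f}")
    print(f"difference:   {abs(a - b):.2e}")
```

Both truncated sums should agree up to floating-point error, since the terms decay geometrically and the tail beyond 200 terms is negligible.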
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